Meta AI's HyperAgents performs metacognitive self-correction, optimizing its own improvement strategies. It shows self-improvement in four non-coding domains, strategies learned in one domain transfer to another, and the agents spontaneously acquire persistent memory.
H Company's Holotron-12B uses a new memory-efficient design to raise throughput for computer-use AI to 8,900 tokens per second. Unsloth has released a beta of 'Studio,' a browser-based tool for no-code model fine-tuning.
OpenAI acquired AI security evaluation platform Promptfoo, and Microsoft announced that Anthropic's Claude Cowork would be integrated into Microsoft 365 Copilot. The structure of the enterprise AI market is starting to change.
Andrej Karpathy released Autoresearch, a system where an AI agent autonomously runs machine-learning experiments on a GPU and tries 100 variants overnight. The article breaks down the mechanism and design so even readers with zero ML background can follow.
Trend Micro analyzed a new AMOS distribution method that targets AI agent workflows. A malicious SKILL.md on OpenClaw plants fake CLI install instructions and uses AI as the intermediary to manipulate people.
Attack techniques and defenses from the MINJA, InjecMEM, and ToxicSkills campaigns that poison AI agents’ memory files, plus GPT-5.3-Codex’s 72% exploit success rate on EVMbench, released by OpenAI and Paradigm. This article lays out how AI becomes both a target of attacks and a weapon for attackers.
Stripe Minions, Amazon Kiro, Claude Code compaction, and a Replit DB deletion. We synthesize multiple cases to extract the design principles required to operate AI coding agents in production, and set them alongside CodeRabbit's 470-repo statistics and related efforts from Google and GitHub.
Andrej Karpathy coined the term "Claws" for a layer that sits above AI agents, and June Kim approached the same problem from a different angle with the Cord framework, implemented with MCP and SQLite. This piece traces the shift from single-shot agents to autonomous coordination systems from both conceptual and implementation perspectives.
Kiro autonomously deleted a production environment, causing 13 hours of AWS downtime; Claude Code’s auto-compaction irreversibly erases context; sub-agents silently burn through usage limits. Three incident reports from the same week.
Stripe’s Minions agent generates 1,300+ PRs per week with zero human effort. Implementation details of the four components: Devbox, Blueprints, Toolshed, and a fork of goose.
Using IBM and UC Berkeley's IT-Bench benchmark and the MAST failure taxonomy, this article examines why enterprise AI agents fail. It covers success rates of just 11% on SRE tasks and 0% on FinOps tasks, plus the Replit production database deletion incident.